Abstract

Introduction. Computer vision is widely used for semantic segmentation of Earth remote sensing (ERS) data. The method
allows monitoring of ecosystems, including aquatic ones. Algorithms that maintain the quality of semantic segmentation of
ERS images are in demand, in particular for identifying areas with phytoplankton where water blooms, a cause of fish
kills (suffocation), are possible. The objective of the study is to create an algorithm that processes satellite data as input
information for building and verifying mathematical models of hydrodynamics, which are used to monitor the
state of water bodies. Various algorithms for semantic segmentation are described in the literature. Recent research focuses
on enhancing the reliability of recognition, often using neural networks. The presented work refines this approach.
To develop it further, a new data set compiled from open sources and synthetic data are proposed, both
aimed at improving the generalization ability of the model. For the first time, the area of the phytoplankton
population contour is compared to the database, and thus the boundary conditions are formed for the implementation of
mathematical models and the construction of boundary-adaptive grids.

Materials and Methods. The set of remote sensing images was supplemented using the authors' own augmentation algorithm
written in Python. Computer vision was used to segment areas of phytoplankton populations in the images. The U-Net
convolutional neural network (CNN) was trained on NVIDIA Tesla T4 computing accelerators.

Results. To automate the detection of phytoplankton distribution areas, a computer vision algorithm based on the U-Net 
CNN was developed. The model was evaluated by the calculated values of the main quality metrics related to 
segmentation tasks. The following metric values were obtained: Precision = 0.89, Recall = 0.88, F1 = 0.87, Dice = 0.87, 
and IoU = 0.79. Plots of CNN training on the training and validation sets showed good
quality of model training, as evidenced by small changes in the loss function at the end of training. The segmentation
performed by the model turned out to be close to manual marking, which indicates the high quality of the proposed
solution. The area of the segmented region of the phytoplankton population was calculated from the area of one pixel. The
result obtained for the original image was 51202.5 (based on information about the number of pixels related to the bloom
of blue-green algae). The corresponding result of the modeling was 51312.

Discussion and Conclusion. The study expands theoretical and practical knowledge on the use of convolutional neural
networks for semantic segmentation of space imagery data. The results of the work make it possible to assess the
potential for automating semantic segmentation of remote sensing data to determine the boundaries of
phytoplankton populations using artificial intelligence. Using the proposed computer vision model to obtain contours
of water bloom caused by phytoplankton will make it possible to create databases that serve as the basis for environmental
monitoring of water resources and predictive modeling of hydrobiological processes.


Keywords: environmental monitoring of water resources, phytoplankton boundaries, water bloom contour, blue-green 
algae bloom, space image data segmentation

Introduction. Automated algorithms for processing information received from satellites are needed in various
fields of activity. Solving fundamental and applied problems of ecology requires segmentation of regions in accordance 
with the focus of attention of researchers. This optimizes the process of studying and modeling hydrobiological 
processes. An example of such local interest is the bloom of water due to the spread of phytoplankton. The phenomenon 
is important for current and complex monitoring of water resources. It is clearly visible from satellites during remote 
sensing of the Earth (ERS). 

Water bloom significantly affects water quality in surface sources used for domestic water supply systems [1]. The
response of phytoplankton populations to the hydrological environment is a reliable indicator of the general state of the aquatic
ecosystem [2]. The negative consequences of uncontrolled algae growth include mass death of fish (suffocation), increased
load on water purification plants [3], and pollution of shores and beaches [4].

Systematic measurements at automatic water quality monitors, as well as obtaining data from research expeditions, 
are labor-intensive and expensive activities. An additional source of information on the state of the phytoplankton 
community is modern satellite systems equipped with survey instruments. They allow remote recording of the state of the 
algae biomass, tracking its dynamics in a given time period. 

A significant advantage of satellite data as a tool for monitoring water resources is the possibility of full-scale and 
operational control at any point on Earth. A wide view of the water area, as a rule, gives researchers a significant amount 
of useful information. But, despite the active development of systems based on computer vision algorithms, the problem 
of identifying the contours of regions of interest in remote sensing data has not yet been fully solved. 

Good results are obtained by various algorithms of semantic segmentation on images. With their help, it is possible to 
identify and clarify the boundaries and structure of natural objects. In [5], the efficiency of the LBP method (local binary 
patterns) for recognizing objects consisting of curvilinear contours is shown. LBP provides high edge sharpness and detail 
of Earth satellite sensing data. In [6], it is noted that to increase the reliability of recognition, it is required to combine 
artificial intelligence algorithms and such classical methods of image edge detection as Sobel, Kirsch and Laplace 
operators. In [7], a comprehensive approach is proposed for semantic processing of satellite images of unlimited size 
using U-Net neural network models, which showed an F1-score value from 0.78 to 0.91 when detecting objects. 

Paper [8] provides an overview of intelligent methods for solving the problem of semantic segmentation of data on 
satellite images. The authors conclude that in this case, neural network algorithms are the most effective and productive. 
As an example, a convolutional neural network (CNN) is given, trained on several thousand satellite images of 
Massachusetts (USA). The accuracy of the model was 85.31%. In [9], semantic segmentation, instance segmentation, and 
panoptic segmentation are considered. The advantages of using deep learning methods implemented in the architectures 
of such CNNs as SegNet, U-Net, and DeepLab are specified. In [10], automated processing of satellite images is based on
a combination of the SpaceNet dataset and the progress in computer vision made possible by deep learning. The
paper presents five approaches based on improvements to the U-Net and Mask R-CNN models.
The metric values for the best models are as follows: average precision (AP) and average recall (AR) are 0.937 and 0.959,
respectively. An effective application of CNN for detecting contrails on satellite images is described in [11]. It is shown
that in large-scale monitoring of contrails with measurement of their impact on climate, the approach based on a CNN of
U-Net architecture demonstrates an F1-score of 0.52, with an overall average detection probability of 0.51.
The 2023 models Segment Anything (SAM), Language-Segment-Anything (Lang-SAM), and HQ-SAM are of
particular interest. These are dynamic deep learning tools that can predict object masks from images using input prompts.
Several researchers have already applied this approach to the analysis of aerial photographs and ERS data. The accuracy
of identifying areas of interest has proven to be high [12]. In [13], the F1-score value reaches 86.5% ± 4.1%. In the future,
models of similar architecture with various modifications (Polyp-SAM, Grounding DINO, etc.) will support both
interactive (requiring user intervention) and automatic segmentation.

Intelligent technologies are increasingly being introduced to process remote sensing data. High accuracy of models is 
noted. Particular attention is paid to methods based on such CNN as SegNet, U-Net, DeepLab in combination with 
classical methods of image preprocessing. A generalized approach to segmentation is actively developing. 

This paper considers a problem of ERS data assimilation and its solution using computer vision. The application of the
U-Net CNN to segmentation of areas containing phytoplankton populations is shown. The algorithm created by the
authors segments regions of interest and calculates their areas, which is required for further
analysis when solving problems of hydrodynamics and hydrobiology.

The following four points describe the scientific novelty of the presented study. 

1. A data set was formed from open sources. 

2. Synthetic data were generated to improve the generalizing ability of the model. For this purpose, the authors’ own 
augmentation algorithm was used to make the model more resistant to noise in practical use [14]. 

3. An intelligent model based on the U-Net CNN architecture was implemented in the high-level Python language. Its 
key hyperparameters were optimized using the Optuna library and checked on a test dataset. 

4. The areas of the found contour containing phytoplankton populations were compared to the existing database. In 
this way, boundary conditions have been formed for the subsequent implementation of mathematical models and the 
construction of boundary-adaptive grids. 

To achieve the set goal, it is required to solve a number of problems: 

— to prepare an ERS database containing segments of regions of interest with water bloom;

— to validate and describe the topology of the U-Net CNN;

— to perform data augmentation to create an extended representative set; 

— to implement, optimize, debug and test the CNN of U-Net architecture; 

— to determine values of key metrics of the model quality for segmentation; 

— to calculate the areas of the segmented contour given the scale of the original image. 

The theoretical significance of the study lies in broadening the understanding of how computer vision
technology can be used in water resources monitoring. The practical significance consists in the development of an applied
cross-platform and scalable tool for analyzing remote sensing images to record regions of interest in aquatic ecosystems.

Materials and Methods. For geospatial analysis, we use open-source software that is often applied to solve 
environmental problems [15]. 

The study is based on current satellite data. The authors focus on the state of water bodies during the bloom of blue- 
green algae. Analysis of this information allows: 

— predicting the volume and distribution of phytoplankton in the water area [16]; 

— checking physical and biological processes that determine the rate of phytoplankton growth and biomass 
accumulation [17]; 

— analyzing climate change based on the forecast of the dynamics of the bloom process [18]; 

— studying in detail the process of CO₂ exchange between a water body and air [19].

To automate the process of detecting regions of phytoplankton populations and calculating their areas, it is proposed 
to develop a computer vision algorithm based on the U-Net CNN architecture. 

As a training sample for the deep learning algorithm, 20 satellite images of water bodies, including the Black, Caspian,
and Azov Seas, were taken. The images cover different locations on the Earth's surface.

The first step was to label the images, transforming the information into a format that the computer vision
algorithm can use for segmentation. There are two common approaches to providing annotations:

— creating a pixel-level mask; 


— selecting polygon boundaries for the region of interest. 
We have used the first option, where the pixel-level mask files represent regions of interest for the algorithm. The
masks are files with the extension jpeg or png. Their proportions match those of the images they annotate. Figure 1
shows an example of the original image and its mask, where green indicates land, blue indicates water surface, and red
indicates the region of the phytoplankton population.
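
As a minimal sketch of how such a colour-coded mask could be turned into per-pixel class labels, the snippet below maps each RGB pixel to the nearest reference colour. The exact RGB values, the class indices and the names `PALETTE`, `nearest_class` and `decode_mask` are assumptions made for illustration and are not taken from the study. Nearest-colour matching is chosen here because lossy formats such as jpeg can shift colours slightly.

```python
# Hypothetical palette: the exact RGB values and class indices are assumed.
PALETTE = {
    0: (0, 255, 0),    # land (green)
    1: (0, 0, 255),    # water surface (blue)
    2: (255, 0, 0),    # phytoplankton population (red)
}


def nearest_class(pixel):
    """Return the class index whose reference colour is closest to the pixel."""
    def sq_dist(colour):
        return sum((p - c) ** 2 for p, c in zip(pixel, colour))
    return min(PALETTE, key=lambda cls: sq_dist(PALETTE[cls]))


def decode_mask(rgb_mask):
    """Convert a 2D grid of RGB tuples into a 2D grid of class indices."""
    return [[nearest_class(px) for px in row] for row in rgb_mask]


if __name__ == "__main__":
    # A 2 x 2 mask with slightly distorted colours.
    mask = [
        [(2, 250, 5), (0, 3, 251)],
        [(249, 4, 1), (255, 0, 0)],
    ]
    print(decode_mask(mask))
```

In a real pipeline the grid would come from an image library rather than nested lists; the nested-list form only keeps the sketch free of dependencies.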


Fig. 1. Image mapping: (a) original image; (b) image mask


To increase the number of images in the data set, we used the authors' augmentation code, supplemented with noise 
effects. When creating the extended data set, we used the following modifications of the original images: 

— rotation by an arbitrary angle; 

— mirroring about the OX and OY axes;

— cropping; 

— scaling; 

— color correction. 

All modifications took into account the noise that may appear in real images obtained through remote sensing,
and the resulting images were segmented using the developed algorithm.

Let us note the advantage of the authors' algorithm for creating additional source data. When the
set of real images is limited, training on artificially created images allows finer tuning of the developed
model, optimizing its parameters and making it more resistant to distortion in practical application.
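
The geometric part of such an augmentation step can be sketched as follows. This is not the authors' algorithm: it is a simplified illustration on nested lists that covers mirroring, a 90-degree rotation (a special case of rotation by an arbitrary angle), cropping and additive Gaussian noise, and leaves out scaling and color correction. All function names and the noise level `sigma` are hypothetical. The key point it shows is that the same geometric transform must be applied to the image and to its mask, while noise is added to the image only.

```python
import random


def flip_ox(grid):
    """Mirror about the OX axis (reverse the order of rows)."""
    return [row[:] for row in reversed(grid)]


def flip_oy(grid):
    """Mirror about the OY axis (reverse each row)."""
    return [list(reversed(row)) for row in grid]


def rotate90(grid):
    """Rotate by 90 degrees clockwise."""
    return [list(col) for col in zip(*reversed(grid))]


def crop(grid, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in grid[top:top + height]]


def add_noise(image, sigma, rng):
    """Add Gaussian noise to grayscale intensities, clipped to 0..255."""
    return [[min(255, max(0, round(v + rng.gauss(0, sigma)))) for v in row]
            for row in image]


def augment_pair(image, mask, rng, sigma=5.0):
    """Apply one random geometric transform to image and mask alike,
    then add noise to the image only so the labels stay exact."""
    transform = rng.choice([flip_ox, flip_oy, rotate90])
    return add_noise(transform(image), sigma, rng), transform(mask)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for reproducibility
    image = [[10, 20], [30, 40]]
    mask = [[0, 1], [1, 0]]
    new_image, new_mask = augment_pair(image, mask, rng)
    print(new_image, new_mask)
```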

The U-Net CNN architecture was originally designed for biomedical image segmentation. The determining
factor in its selection is the relatively small size of the initial data set, on which U-Net shows satisfactory results in practice.

The U-Net CNN architecture is built from convolution and pooling layers that first reduce the spatial
resolution of the image (encoder) and then increase it (decoder). At each decoder stage, the upsampled features are
combined with the corresponding encoder features and passed through further convolution layers (Fig. 2).
Fig. 2. Architecture of the U-Net CNN


The convolutional blocks of the encoder and decoder are linked by skip connections. These
help mitigate the vanishing gradient problem, which is a challenge in computer vision [20]. In this study, we used the encoder
from the ResNet-50 neural network, pre-trained on the ImageNet dataset.

To select the hyperparameters of the U-Net CNN that affect the architecture and training process, the Optuna library 
was used. This made it possible to automate the model tuning to achieve better results. 


Information Technology, Computer Science and Management

https://vestnik-donstu.ru

Belova YuV, et al. Development of an Algorithm for Semantic Segmentation of Earth Remote Sensing Data ...


Research Results. Table 1 shows the model parameters specified for training.


Table 1
Parameters for Training the U-Net Convolutional Neural Network
No. | Parameter                           | Value
1   | Number of images in training set    | 700
2   | Number of images in validation set  | 200
3   | Number of images in test sample     | 100
4   | Batch size                          | 10
5   | Learning rate                       | 1e-4
6   | Overfitting detector                | Early stopping
7   | Solver                              | Adam
The model was trained by minimizing the Dice loss function (2), which is based on the Dice coefficient (1):

$$\mathrm{DSC} = \frac{2\,|X \cap Y|}{|X| + |Y|}, \tag{1}$$

$$\mathrm{DiceLoss} = 1 - \frac{2\,|X \cap Y| + \mathrm{Smooth}}{|X| + |Y| + \mathrm{Smooth}}. \tag{2}$$

Here, X is the set of pixels labeled as belonging to a specific class during the manual mapping; Y is the set of pixels
assigned to that class by the developed segmentation model. The Smooth coefficient
stabilizes the calculation result when |X| and |Y| are close to zero.
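
To make the two quantities concrete, here is a minimal sketch that evaluates them on small pixel sets. It assumes the common form of the Dice loss, one minus the smoothed Dice coefficient. The function names and the default `smooth=1.0` are illustrative choices, not values from the study. During training a differentiable version on predicted probabilities is normally used; the set-based version below only illustrates the arithmetic.

```python
def dice_coefficient(x, y):
    """Dice coefficient of two pixel sets: 2|X n Y| / (|X| + |Y|)."""
    if not x and not y:
        return 1.0  # assumption: two empty sets count as a perfect match
    return 2 * len(x & y) / (len(x) + len(y))


def dice_loss(x, y, smooth=1.0):
    """Dice loss in the assumed '1 - smoothed Dice' form."""
    return 1 - (2 * len(x & y) + smooth) / (len(x) + len(y) + smooth)


if __name__ == "__main__":
    # Pixels are (row, column) pairs; X is the manual mapping, Y the model output.
    X = {(0, 0), (0, 1), (1, 0), (1, 1)}
    Y = {(0, 1), (1, 1), (2, 1)}
    print(dice_coefficient(X, Y))   # 4/7
    print(dice_loss(X, Y))          # 1 - 5/8
    print(dice_loss(set(), set()))  # the smoothing term avoids division by zero
```

With both sets empty, the unsmoothed ratio would be 0/0, while the smoothed loss evaluates to 0, which is the purpose of the Smooth term.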

The Adam method for stochastic optimization was used to train the model. Early stopping, one of the most widely used
regularization methods for preventing overfitting in machine learning, served as the overfitting detector. Training
was performed on NVIDIA Tesla T4 computing accelerators; it ran for 100 epochs and
took 55 minutes.
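
The idea of early stopping can be sketched as follows. The study does not report its stopping criterion, so the `patience` and `min_delta` parameters, the class name and the loss values below are all hypothetical. The sketch stops training once the validation loss has failed to improve for a given number of consecutive epochs.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience    # allowed epochs without improvement
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses, patience):
    """Return the number of epochs completed before early stopping triggers."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.step(loss):
            return epoch
    return len(val_losses)


if __name__ == "__main__":
    # Made-up validation losses: improvement stalls after the third epoch.
    losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
    print(epochs_run(losses, patience=3))
```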

Figure 3 shows the loss curves of the CNN on the training and validation sets. The OX axis shows the training epochs,
and the OY axis shows the values of the loss function. The graph indicates that the quality of model
training is good, since only small changes in the loss function are observed at the end of training.
Fig. 3. Training of the U-Net CNN on the training sample and on the validation sample


When assessing the quality of segmentation models, the Dice coefficient is used together with a metric of the degree of overlap
between the predicted and reference regions (Intersection over Union, IoU, also known as the Jaccard index), determined from
the following formula:

$$\mathrm{IoU} = \frac{|X \cap Y|}{|X \cup Y|}, \tag{3}$$

where X is the set of pixels labeled as belonging to a specific class during the manual mapping; Y is the set of pixels
assigned to that class by the developed segmentation model.
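
A minimal sketch of the IoU computation on pixel sets is given below. The function name and the example sets are illustrative. For a single pair of sets, IoU and the Dice coefficient are linked by Dice = 2 IoU / (1 + IoU); this identity need not hold for values averaged over classes or images.

```python
def iou(x, y):
    """Intersection over Union (Jaccard index) of two pixel sets."""
    union = len(x | y)
    if union == 0:
        return 1.0  # assumption: two empty sets count as a perfect match
    return len(x & y) / union


if __name__ == "__main__":
    # Pixels are (row, column) pairs; X is the manual mapping, Y the model output.
    X = {(0, 0), (0, 1), (1, 0), (1, 1)}
    Y = {(0, 1), (1, 1), (2, 1)}
    j = iou(X, Y)
    dice = 2 * len(X & Y) / (len(X) + len(Y))
    print(j, dice, 2 * j / (1 + j))  # the last two values coincide
```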
Table 2 presents the values of per-pixel precision, recall, F1-score, Dice coefficient, and IoU. To obtain the final IoU
value, the weighted average is calculated for the values of this metric for each class.


Table 2
Results of Model Quality Assessment on the Test Sample
Metric                        | Precision | Recall | F1   | Dice | IoU
Average value for test sample | 0.89      | 0.88   | 0.87 | 0.87 | 0.79
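
The per-pixel metrics in Table 2 can be illustrated with the sketch below, which works on flat lists of class labels. The weighting scheme is an assumption: the study states only that a weighted average over classes is used, and here each class is weighted by its share of pixels in the reference labels. The function names and the toy labels are hypothetical.

```python
def pixel_metrics(true_labels, pred_labels, cls):
    """Per-pixel precision, recall, F1 and IoU for one class."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(t == cls and p == cls for t, p in pairs)  # true positives
    fp = sum(t != cls and p == cls for t, p in pairs)  # false positives
    fn = sum(t == cls and p != cls for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou


def weighted_iou(true_labels, pred_labels, classes):
    """Average per-class IoU, weighted by each class's share of reference pixels."""
    total = len(true_labels)
    return sum(
        pixel_metrics(true_labels, pred_labels, c)[3] * true_labels.count(c) / total
        for c in classes
    )


if __name__ == "__main__":
    # Toy labels: 0 = land, 1 = water, 2 = phytoplankton (indices assumed).
    true = [0, 0, 1, 1, 2, 2, 2, 2]
    pred = [0, 1, 1, 1, 2, 2, 2, 0]
    print(pixel_metrics(true, pred, 2))
    print(weighted_iou(true, pred, [0, 1, 2]))
```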


Figure 4 shows the results of the algorithm's work on segmenting regions of water resources, land and phytoplankton 
populations. The results obtained satisfy the tasks of water resource monitoring and have practical value. 


Fig. 4. Algorithm results for segmenting areas of water resources, land and phytoplankton populations:
(a, d) original image; (b, e) manual mapping; (c, f) model result


The segmentation result in Figures 4 c and 4 f is visually close to manual mapping, which indicates the high quality
of the model. The area of the segmented region of the phytoplankton population was calculated from the area of
one pixel. Each image has additional metadata indicating its scale and resolution, from which
the area occupied by each pixel is calculated. In the case considered for Figure 4 a, the final value is 51202.5. This figure
was obtained from information on the number of pixels related to blue-green algae blooms in a set of segmented
images of phytoplankton populations in coastal systems [21]. The calculation result for Figure 4 c is 51312.
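
The area calculation and the comparison of the two values can be sketched as follows. The pixel count and pixel size in the first call are hypothetical, since the study does not list them, and the units of the two reported values are not stated, so only their relative difference is computed. The function names are illustrative.

```python
def region_area(pixel_count, pixel_size):
    """Area of a segmented region: number of pixels times the area of one pixel.

    pixel_size is the ground length of one pixel side, taken from image metadata.
    """
    return pixel_count * pixel_size ** 2


def relative_difference(reference, estimate):
    """Absolute difference between two values as a fraction of the reference."""
    return abs(estimate - reference) / reference


if __name__ == "__main__":
    # Hypothetical example: 1200 bloom pixels at 10 units per pixel side.
    print(region_area(1200, 10))

    # Values reported in the text: reference 51202.5, model result 51312.
    diff = relative_difference(51202.5, 51312)
    print(f"{diff:.4%}")
```

On the reported values the model result differs from the reference by roughly 0.2 percent.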
Discussion and Conclusion. When assessing the state of water resources, computer vision and other machine learning
algorithms free specialists from monotonous operations, which are instead performed by intelligent systems. In
this case, monitoring can be carried out round the clock. Such algorithms can help predict risks, model how a
situation develops, and support operational decision-making. Knowledge stored and replicated in the form
of databases and registers can serve as a long-term source of information that researchers can use to analyze the
state of water bodies and build climate models.

Processing ERS data in the form of semantic contours will make it possible to verify complex mathematical models by
refining boundary and initial conditions, increasing the accuracy, speed and reliability of predictive modeling of
hydrobiological processes.
Claimed Contributorship:

Yu.V. Belova: software implementation and testing of the intelligent algorithm based on deep learning for processing
satellite observation data, analysis of the quality of the segmentation algorithm using field observation data.

I.F. Razveeva: software implementation, training and debugging of the intelligent algorithm based on deep learning
for processing satellite observation data, correction of the article text.

E.O. Rakhimbaeva: collection and preprocessing of the training data set, implementation of the data augmentation
process, formatting of the scientific article.